perf(utils): linear scan `is_uri` #2648

jamestrew · 2023-08-11T01:05:51Z

Benchmarked against 80k+ unique paths, both Windows and Linux (macOS should be close enough to Linux).

Approximately 13% faster for Linux paths and 55% faster for Windows paths.

$ hyperfine 'luajit current_linux.lua' 'luajit fast_linux.lua' --warmup 10
Benchmark 1: luajit current_linux.lua
  Time (mean ± σ):      25.7 ms ±   0.8 ms    [User: 22.7 ms, System: 3.4 ms]
  Range (min … max):    24.0 ms …  28.8 ms    87 runs

Benchmark 2: luajit fast_linux.lua
  Time (mean ± σ):      22.7 ms ±   0.8 ms    [User: 19.4 ms, System: 3.6 ms]
  Range (min … max):    21.4 ms …  26.0 ms    99 runs

Summary
  luajit fast_linux.lua ran
    1.13 ± 0.05 times faster than luajit current_linux.lua

$ hyperfine 'luajit current_win.lua' 'luajit fast_win.lua' --warmup 10
Benchmark 1: luajit current_win.lua
  Time (mean ± σ):      36.3 ms ±   0.9 ms    [User: 34.1 ms, System: 2.7 ms]
  Range (min … max):    34.7 ms …  38.9 ms    69 runs

Benchmark 2: luajit fast_win.lua
  Time (mean ± σ):      23.5 ms ±   0.7 ms    [User: 21.1 ms, System: 2.9 ms]
  Range (min … max):    22.4 ms …  25.7 ms    95 runs

Summary
  luajit fast_win.lua ran
    1.55 ± 0.06 times faster than luajit current_win.lua

juntuu · 2023-08-23T06:05:02Z

Hi, just bumped into this while browsing.

I'm not at all familiar with the project, so I can't say whether the performance improvement outweighs the complexity cost, or what the effect would be in real workloads.

However, if I'm not mistaken this change introduces two differences in the matching:

1. does not require a : in uri (is_uri("hello") == true)
2. the windows "drive letter" can be more than one letter (is_uri("hello:\\") == false)

Because the function checks for a URI and not a Windows path, I think 2. is more correct now. A URI should not contain \ anyway.

But 1. seems like a bug.

I think the following (pseudocode) should fix 1. while keeping 2.:

if filename[1] not in [a-zA-Z] then
  return false
end
for i = 2, #filename do
  if filename[i] == ':' then
    return filename[i+1] ~= '\\'
  elseif filename[i] not in [a-zA-Z0-9.+-] then
    return false
  end
end
return false
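
For readers who want to try it, the pseudocode above could be written as runnable Lua roughly as follows. This is only a sketch: the name `is_uri_sketch` is hypothetical, the Lua patterns are my own choice for the character classes, and it is not necessarily the code that was merged. It follows the pseudocode literally, so a trailing `:` with nothing after it still counts as a URI.

```lua
-- Hypothetical helper mirroring the pseudocode; not the project's actual code.
local function is_uri_sketch(filename)
  -- A scheme must start with an ASCII letter (an empty string fails here too).
  if not filename:sub(1, 1):match("^%a$") then
    return false
  end
  for i = 2, #filename do
    local ch = filename:sub(i, i)
    if ch == ":" then
      -- A backslash right after the colon looks like a Windows path (C:\...).
      return filename:sub(i + 1, i + 1) ~= "\\"
    elseif not ch:match("^[%w.+-]$") then
      -- Only letters, digits, '.', '+' and '-' are allowed before the colon.
      return false
    end
  end
  -- No colon found at all, so there is no scheme.
  return false
end

print(is_uri_sketch("hello"))         --> false
print(is_uri_sketch("hello:\\"))      --> false
print(is_uri_sketch("file:///tmp/x")) --> true
```

The single pass returns at the first colon or the first disallowed character, so it never inspects more than the scheme-length prefix of the path.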

max397574 · 2023-08-23T06:14:08Z

@juntuu that pseudocode won't work afaict, since the elseif branch can never be taken.
It would only match when the character is a colon, but in that case the preceding if statement has already returned false.
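
To illustrate the ordering issue, here is a minimal Lua sketch of a variant in which the character-class check runs before the colon check. The earlier revision of the pseudocode is not shown in this thread, so this is an assumed reconstruction, and the name `is_uri_unflipped` is hypothetical.

```lua
-- Assumed reconstruction of the problematic ordering; for illustration only.
local function is_uri_unflipped(filename)
  if not filename:sub(1, 1):match("^%a$") then
    return false
  end
  for i = 2, #filename do
    local ch = filename:sub(i, i)
    if not ch:match("^[%w.+-]$") then
      -- ':' is not in the allowed set, so a colon always returns here.
      return false
    elseif ch == ":" then
      -- Never reached: a colon was already rejected by the branch above.
      return filename:sub(i + 1, i + 1) ~= "\\"
    end
  end
  return false
end

print(is_uri_unflipped("file:///tmp/x")) --> false
```

With this ordering every input is rejected, including a well-formed URI, which is why the colon has to be checked first.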

juntuu · 2023-08-23T06:20:02Z

> that pseudocode won't work

Oops, good catch. Thanks!

I flipped the conditions now, so the : is checked first.

jamestrew · 2023-08-26T05:14:57Z

Good catches, both of you!
I guess under ideal conditions, something like is_uri('hello'), or really anything that's not a full path, shouldn't happen, but for the sake of correctness I like the suggested change.

No significant performance difference between my original implementation and the suggested change.

$ hyperfine 'luajit before_linux.lua' 'luajit after_linux.lua' --warmup 10 --runs 100
Benchmark 1: luajit before_linux.lua
  Time (mean ± σ):      27.2 ms ±   0.5 ms    [User: 22.6 ms, System: 4.5 ms]
  Range (min … max):    26.4 ms …  29.9 ms    100 runs

Benchmark 2: luajit after_linux.lua
  Time (mean ± σ):      27.5 ms ±   0.8 ms    [User: 22.8 ms, System: 4.7 ms]
  Range (min … max):    26.0 ms …  30.5 ms    100 runs

Summary
  luajit before_linux.lua ran
    1.01 ± 0.03 times faster than luajit after_linux.lua

$ hyperfine 'luajit before_win.lua' 'luajit after_win.lua' --warmup 10 --runs 100
Benchmark 1: luajit before_win.lua
  Time (mean ± σ):      26.9 ms ±   0.5 ms    [User: 23.6 ms, System: 3.6 ms]
  Range (min … max):    26.0 ms …  28.8 ms    100 runs

Benchmark 2: luajit after_win.lua
  Time (mean ± σ):      25.9 ms ±   0.7 ms    [User: 22.9 ms, System: 3.3 ms]
  Range (min … max):    24.8 ms …  28.9 ms    100 runs

Summary
  luajit after_win.lua ran
    1.04 ± 0.03 times faster than luajit before_win.lua

jamestrew · 2023-08-28T04:26:05Z

I'm going to go ahead and merge this in. Considering this is a pretty hot function, I think the performance gain, particularly on Windows, is worth the diminished readability.


jamestrew mentioned this pull request Aug 11, 2023

fix: handle windows file paths as uris #2640

Merged


jamestrew force-pushed the is_uri_cleanup branch from 2d6b82a to cc9193e Compare August 26, 2023 05:06

jamestrew force-pushed the is_uri_cleanup branch from cc9193e to 8ea8b04 Compare August 27, 2023 21:40

perf(utils): linear scan is_uri

a2d7603

jamestrew force-pushed the is_uri_cleanup branch from 8ea8b04 to a2d7603 Compare August 28, 2023 04:22

jamestrew merged commit 1dfa66b into master Aug 28, 2023
13 checks passed

jamestrew deleted the is_uri_cleanup branch August 28, 2023 04:26

Conni2461 pushed a commit that referenced this pull request Sep 5, 2023

perf(utils): linear scan is_uri (#2648)

30ba3db

(cherry picked from commit 1dfa66b)
